Fast keyword detection using suffix array
Abstract
In this paper, we propose a technique for quickly detecting keywords in a very large speech database without requiring a large amount of memory. To accelerate searches and save memory, we use a suffix array as the data structure and apply phoneme-based DP-matching to it. To avoid an exponential increase in processing time with the length of the keyword, a long keyword is divided into short sub-keywords. Moreover, an iterative lengthening search algorithm is used to rapidly output accurate search results. The experimental results show that it takes less than 100 ms to detect the first set of search results from a 10,000-h virtual speech database.
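To make the combination of a suffix array, keyword division, and DP-matching more concrete, here is a minimal Python sketch. It is not the authors' implementation: it looks up sub-keywords by exact match in the suffix array and then verifies each candidate window with a plain edit distance, whereas the abstract describes DP-matching applied on the suffix array itself. Iterative lengthening is omitted. All function names, the `n_parts` and `max_dist` parameters, and the toy data are assumptions made for illustration.

```python
def build_suffix_array(phonemes):
    """Return suffix start positions sorted lexicographically (naive construction)."""
    return sorted(range(len(phonemes)), key=lambda i: phonemes[i:])


def find_exact(phonemes, sa, pattern):
    """Binary-search the suffix array for every suffix that starts with `pattern`."""
    pattern = list(pattern)
    m = len(pattern)

    def prefix(k):
        return phonemes[sa[k]:sa[k] + m]

    lo, hi = 0, len(sa)
    while lo < hi:  # lower bound: first suffix whose prefix is >= pattern
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # upper bound: first suffix whose prefix is > pattern
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


def edit_distance(a, b):
    """Classic DP edit distance with unit costs (a stand-in for phoneme DP-matching)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def split_keyword(keyword, n_parts):
    """Divide a keyword into at most `n_parts` consecutive sub-keywords."""
    size = -(-len(keyword) // n_parts)  # ceiling division
    return [keyword[i:i + size] for i in range(0, len(keyword), size)]


def search(phonemes, sa, keyword, n_parts=2, max_dist=1):
    """Use sub-keyword hits as candidates, then verify the whole keyword by DP."""
    candidates = set()
    offset = 0
    for sub in split_keyword(keyword, n_parts):
        for hit in find_exact(phonemes, sa, sub):
            if hit - offset >= 0:
                candidates.add(hit - offset)
        offset += len(sub)
    results = []
    for start in sorted(candidates):
        window = phonemes[start:start + len(keyword)]
        dist = edit_distance(keyword, window)
        if dist <= max_dist:
            results.append((start, dist))
    return results


# Toy data: single characters stand in for phoneme symbols.
doc = list("xxhellozzhallozz")
sa = build_suffix_array(doc)
print(search(doc, sa, list("hello")))  # [(2, 0), (9, 1)]
```

In this toy run the second hit ("hallo") is found only because the sub-keyword "lo" still matches exactly, which illustrates why dividing the keyword lets a search tolerate errors without exploring every approximate match of the full keyword.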
Similar resources
Acceleration of spoken term detection using a suffix array by assigning optimal threshold values to sub-keywords
We previously proposed a fast spoken term detection method that uses a suffix array data structure for searching large-scale speech documents. The method reduces search time via techniques such as keyword division and iterative lengthening search. In this paper, we propose a statistical method of assigning different threshold values to sub-keywords to further accelerate search. Specifically, th...
Evaluation of Fast Spoken Term Detection Using a Suffix Array
We previously proposed [1] a fast spoken term detection method that uses a suffix array as a data structure for searching large-scale speech documents. In this method, a keyword is divided into sub-keywords, and the phoneme sequences that contain two or more sub-keywords are output as results. Although the search is executed very quickly on a 10,000-h speech database, we only proposed a variety of matc...
Using Multiple Speech Recognition Results to Enhance STD with Suffix Array on the NTCIR-10 SpokenDoc-2 Task
We have previously proposed a fast spoken term detection method that uses a suffix array as a data structure. By applying dynamic time warping on a suffix array, we achieved very quick keyword detection from a very large-scale speech document. In this study, we modify our method so that it can deal with multiple recognition results. By using these results obtained from various speech recognizer...
Utilizing Confusion Network in the STD with Suffix Array and Its Evaluation on the NTCIR-11 SpokenQuery & Doc SQ-STD Task
The authors have proposed a fast spoken term detection method that uses a suffix array as a data structure. This method enables very quick and memory-saving search by using such techniques as keyword division, dynamic time warping, and an articulatory-feature-based local distance definition. In this paper, we investigate a new approach that utilizes a confusion network in the suffix array. T...
Utilization of Suffix Array for Quick STD and Its Evaluation on the NTCIR-9 SpokenDoc Task
We propose a technique for detecting keywords quickly from a very large speech database without using a large amount of memory. To accelerate search and save memory, we employed a suffix array as a data structure and applied phoneme-based DP-matching to it. To avoid exponential explosion of process time with the length of a keyword, a long keyword is divided into short sub-keyword...